{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "20ca3e28",
   "metadata": {},
   "source": [
    "# Day 7：入门级模型（概念体验，不求深）\n",
    "\n",
    "> 今日目标：理解“用特征预测标签”的基本流程；跑通 1 个线性回归 + 1 个简单分类；会看基础评估指标；知道模型局限。\n",
    "\n",
    "学习路径：\n",
    "1. 机器学习最小流程概念\n",
    "2. 准备数据与特征构造 (lag / 简单派生)\n",
    "3. 回归：预测 PM2.5 (线性回归)\n",
    "4. 分类：判断是否为高污染日 (二分类)\n",
    "5. 评估指标 (R² / MAE / Accuracy)\n",
    "6. 过拟合直观认识 & 基线对比\n",
    "7. 模型解释初步 (系数)\n",
    "8. 小结\n",
    "9. 课后作业提示"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55a36c03",
   "metadata": {},
   "source": [
    "## 1. 机器学习最小流程概念\n",
    "一句话：用历史“特征”学习一个函数 f(x)≈y，再用来预测新样本。\n",
    "\n",
    "流程 6 步：\n",
    "1. 明确任务（回归 or 分类）\n",
    "2. 准备数据（清洗 / 选取列 / 构造特征）\n",
    "3. 划分训练集 / 测试集 (train_test_split)\n",
    "4. 训练：fit\n",
    "5. 评估：在测试集上算指标\n",
    "6. 解释 & 复盘：是否有效？是否存在风险？\n",
    "\n",
    "今日我们不追求算法全面，只体验流水线。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "047331ab",
   "metadata": {},
   "source": [
    "## 2. 准备数据与特征\n",
    "使用 Day 4/5 的 `air_quality_timeseries.csv`。我们构造：\n",
    "- 特征：同日 PM10 / NO2 / SO2 + 前一日 PM25 (lag1)\n",
    "- 目标 (回归)：当日 PM25\n",
    "- 目标 (分类)：高污染标签 high_flag (PM25 > 55 记为 1) (阈值仅演示)\n",
    "\n",
    "Lag 处理：按 city + date 排序，向下平移 1 行。首行会缺失 → 丢弃。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2dd828ce",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "date",
         "rawType": "datetime64[ns]",
         "type": "datetime"
        },
        {
         "name": "city",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "province",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "PM25",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "PM10",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "NO2",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "SO2",
         "rawType": "int64",
         "type": "integer"
        }
       ],
       "ref": "49bbbe58-e913-4153-9267-6c8bdb87be0d",
       "rows": [
        [
         "0",
         "2025-09-01 00:00:00",
         "广州",
         "广东",
         "42",
         "55",
         "19",
         "7"
        ],
        [
         "1",
         "2025-09-02 00:00:00",
         "广州",
         "广东",
         "41",
         "54",
         "18",
         "7"
        ],
        [
         "2",
         "2025-09-03 00:00:00",
         "广州",
         "广东",
         "39",
         "50",
         "17",
         "6"
        ],
        [
         "3",
         "2025-09-04 00:00:00",
         "广州",
         "广东",
         "45",
         "57",
         "20",
         "7"
        ],
        [
         "4",
         "2025-09-05 00:00:00",
         "广州",
         "广东",
         "44",
         "56",
         "19",
         "7"
        ]
       ],
       "shape": {
        "columns": 7,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>province</th>\n",
       "      <th>PM25</th>\n",
       "      <th>PM10</th>\n",
       "      <th>NO2</th>\n",
       "      <th>SO2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2025-09-01</td>\n",
       "      <td>广州</td>\n",
       "      <td>广东</td>\n",
       "      <td>42</td>\n",
       "      <td>55</td>\n",
       "      <td>19</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2025-09-02</td>\n",
       "      <td>广州</td>\n",
       "      <td>广东</td>\n",
       "      <td>41</td>\n",
       "      <td>54</td>\n",
       "      <td>18</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2025-09-03</td>\n",
       "      <td>广州</td>\n",
       "      <td>广东</td>\n",
       "      <td>39</td>\n",
       "      <td>50</td>\n",
       "      <td>17</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2025-09-04</td>\n",
       "      <td>广州</td>\n",
       "      <td>广东</td>\n",
       "      <td>45</td>\n",
       "      <td>57</td>\n",
       "      <td>20</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2025-09-05</td>\n",
       "      <td>广州</td>\n",
       "      <td>广东</td>\n",
       "      <td>44</td>\n",
       "      <td>56</td>\n",
       "      <td>19</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        date city province  PM25  PM10  NO2  SO2\n",
       "0 2025-09-01   广州       广东    42    55   19    7\n",
       "1 2025-09-02   广州       广东    41    54   18    7\n",
       "2 2025-09-03   广州       广东    39    50   17    6\n",
       "3 2025-09-04   广州       广东    45    57   20    7\n",
       "4 2025-09-05   广州       广东    44    56   19    7"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.linear_model import LinearRegression, LogisticRegression\n",
    "from sklearn.metrics import r2_score, mean_absolute_error, accuracy_score, classification_report\n",
    "\n",
    "df = pd.read_csv('../data/air_quality_timeseries.csv', parse_dates=['date'])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "06c5c474",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "date",
         "rawType": "datetime64[ns]",
         "type": "datetime"
        },
        {
         "name": "city",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "province",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "PM25",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "PM10",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "NO2",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "SO2",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "PM25_lag1",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "high_flag",
         "rawType": "int64",
         "type": "integer"
        }
       ],
       "ref": "b95b99dc-d3eb-4034-8d72-5ba4496ac99e",
       "rows": [
        [
         "10",
         "2025-09-01 00:00:00",
         "上海",
         "上海",
         "45",
         "62",
         "28",
         "10",
         null,
         "0"
        ],
        [
         "11",
         "2025-09-02 00:00:00",
         "上海",
         "上海",
         "47",
         "63",
         "29",
         "10",
         "45.0",
         "0"
        ],
        [
         "12",
         "2025-09-03 00:00:00",
         "上海",
         "上海",
         "43",
         "60",
         "27",
         "9",
         "47.0",
         "0"
        ],
        [
         "13",
         "2025-09-04 00:00:00",
         "上海",
         "上海",
         "46",
         "61",
         "28",
         "10",
         "43.0",
         "0"
        ],
        [
         "14",
         "2025-09-05 00:00:00",
         "上海",
         "上海",
         "44",
         "59",
         "27",
         "9",
         "46.0",
         "0"
        ],
        [
         "5",
         "2025-09-01 00:00:00",
         "北京",
         "北京",
         "60",
         "80",
         "35",
         "12",
         null,
         "1"
        ],
        [
         "6",
         "2025-09-02 00:00:00",
         "北京",
         "北京",
         "65",
         "85",
         "37",
         "13",
         "60.0",
         "1"
        ],
        [
         "7",
         "2025-09-03 00:00:00",
         "北京",
         "北京",
         "58",
         "78",
         "34",
         "11",
         "65.0",
         "1"
        ]
       ],
       "shape": {
        "columns": 9,
        "rows": 8
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>province</th>\n",
       "      <th>PM25</th>\n",
       "      <th>PM10</th>\n",
       "      <th>NO2</th>\n",
       "      <th>SO2</th>\n",
       "      <th>PM25_lag1</th>\n",
       "      <th>high_flag</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>2025-09-01</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>45</td>\n",
       "      <td>62</td>\n",
       "      <td>28</td>\n",
       "      <td>10</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>2025-09-02</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>47</td>\n",
       "      <td>63</td>\n",
       "      <td>29</td>\n",
       "      <td>10</td>\n",
       "      <td>45.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2025-09-03</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>43</td>\n",
       "      <td>60</td>\n",
       "      <td>27</td>\n",
       "      <td>9</td>\n",
       "      <td>47.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>2025-09-04</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>46</td>\n",
       "      <td>61</td>\n",
       "      <td>28</td>\n",
       "      <td>10</td>\n",
       "      <td>43.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>2025-09-05</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>44</td>\n",
       "      <td>59</td>\n",
       "      <td>27</td>\n",
       "      <td>9</td>\n",
       "      <td>46.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2025-09-01</td>\n",
       "      <td>北京</td>\n",
       "      <td>北京</td>\n",
       "      <td>60</td>\n",
       "      <td>80</td>\n",
       "      <td>35</td>\n",
       "      <td>12</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2025-09-02</td>\n",
       "      <td>北京</td>\n",
       "      <td>北京</td>\n",
       "      <td>65</td>\n",
       "      <td>85</td>\n",
       "      <td>37</td>\n",
       "      <td>13</td>\n",
       "      <td>60.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2025-09-03</td>\n",
       "      <td>北京</td>\n",
       "      <td>北京</td>\n",
       "      <td>58</td>\n",
       "      <td>78</td>\n",
       "      <td>34</td>\n",
       "      <td>11</td>\n",
       "      <td>65.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date city province  PM25  PM10  NO2  SO2  PM25_lag1  high_flag\n",
       "10 2025-09-01   上海       上海    45    62   28   10        NaN          0\n",
       "11 2025-09-02   上海       上海    47    63   29   10       45.0          0\n",
       "12 2025-09-03   上海       上海    43    60   27    9       47.0          0\n",
       "13 2025-09-04   上海       上海    46    61   28   10       43.0          0\n",
       "14 2025-09-05   上海       上海    44    59   27    9       46.0          0\n",
       "5  2025-09-01   北京       北京    60    80   35   12        NaN          1\n",
       "6  2025-09-02   北京       北京    65    85   37   13       60.0          1\n",
       "7  2025-09-03   北京       北京    58    78   34   11       65.0          1"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 按城市 + 日期排序，并创建 lag1 特征\n",
    "df = df.sort_values(['city','date']).copy()\n",
    "df['PM25_lag1'] = df.groupby('city')['PM25'].shift(1)\n",
    "# 分类标签\n",
    "df['high_flag'] = (df['PM25'] > 55).astype(int)\n",
    "df.head(8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "458d6879",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "原始行数: 30  可用行数: 24\n"
     ]
    },
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "date",
         "rawType": "datetime64[ns]",
         "type": "datetime"
        },
        {
         "name": "city",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "province",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "PM25",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "PM10",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "NO2",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "SO2",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "PM25_lag1",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "high_flag",
         "rawType": "int64",
         "type": "integer"
        }
       ],
       "ref": "a77f9583-7d9e-40f9-9132-acd696950b13",
       "rows": [
        [
         "11",
         "2025-09-02 00:00:00",
         "上海",
         "上海",
         "47",
         "63",
         "29",
         "10",
         "45.0",
         "0"
        ],
        [
         "12",
         "2025-09-03 00:00:00",
         "上海",
         "上海",
         "43",
         "60",
         "27",
         "9",
         "47.0",
         "0"
        ],
        [
         "13",
         "2025-09-04 00:00:00",
         "上海",
         "上海",
         "46",
         "61",
         "28",
         "10",
         "43.0",
         "0"
        ],
        [
         "14",
         "2025-09-05 00:00:00",
         "上海",
         "上海",
         "44",
         "59",
         "27",
         "9",
         "46.0",
         "0"
        ],
        [
         "6",
         "2025-09-02 00:00:00",
         "北京",
         "北京",
         "65",
         "85",
         "37",
         "13",
         "60.0",
         "1"
        ]
       ],
       "shape": {
        "columns": 9,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>province</th>\n",
       "      <th>PM25</th>\n",
       "      <th>PM10</th>\n",
       "      <th>NO2</th>\n",
       "      <th>SO2</th>\n",
       "      <th>PM25_lag1</th>\n",
       "      <th>high_flag</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>2025-09-02</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>47</td>\n",
       "      <td>63</td>\n",
       "      <td>29</td>\n",
       "      <td>10</td>\n",
       "      <td>45.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2025-09-03</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>43</td>\n",
       "      <td>60</td>\n",
       "      <td>27</td>\n",
       "      <td>9</td>\n",
       "      <td>47.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>2025-09-04</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>46</td>\n",
       "      <td>61</td>\n",
       "      <td>28</td>\n",
       "      <td>10</td>\n",
       "      <td>43.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>2025-09-05</td>\n",
       "      <td>上海</td>\n",
       "      <td>上海</td>\n",
       "      <td>44</td>\n",
       "      <td>59</td>\n",
       "      <td>27</td>\n",
       "      <td>9</td>\n",
       "      <td>46.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2025-09-02</td>\n",
       "      <td>北京</td>\n",
       "      <td>北京</td>\n",
       "      <td>65</td>\n",
       "      <td>85</td>\n",
       "      <td>37</td>\n",
       "      <td>13</td>\n",
       "      <td>60.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date city province  PM25  PM10  NO2  SO2  PM25_lag1  high_flag\n",
       "11 2025-09-02   上海       上海    47    63   29   10       45.0          0\n",
       "12 2025-09-03   上海       上海    43    60   27    9       47.0          0\n",
       "13 2025-09-04   上海       上海    46    61   28   10       43.0          0\n",
       "14 2025-09-05   上海       上海    44    59   27    9       46.0          0\n",
       "6  2025-09-02   北京       北京    65    85   37   13       60.0          1"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 丢弃 lag 缺失行\n",
    "model_df = df.dropna(subset=['PM25_lag1']).copy()\n",
    "print('原始行数:', df.shape[0], ' 可用行数:', model_df.shape[0])\n",
    "model_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bb68cc3",
   "metadata": {},
   "source": [
    "### 小练习 1\n",
    "1. 思考还有哪些简单可加特征？(例如 PM 差值：PM10-PM25)\n",
    "2. 尝试新增列 PM_diff = PM10 - PM25 并查看是否与 PM25 有相关。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "454edb4f",
   "metadata": {},
   "source": [
    "## 3. 回归：线性回归预测 PM2.5\n",
    "目标：预测 `PM25`。特征：PM10, NO2, SO2, PM25_lag1。\n",
    "评估：R²（越接近 1 越好）+ MAE（越小越好）。\n",
    "基线思路：如果总是预测训练集均值，MAE 大概多少？(可作为对照)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "01c75d3b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "R^2: 0.982\n",
      "MAE: 0.58\n",
      "基线 MAE (预测均值): 11.62\n"
     ]
    }
   ],
   "source": [
    "features = ['PM10','NO2','SO2','PM25_lag1']\n",
    "X = model_df[features]\n",
    "y = model_df['PM25']\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n",
    "reg = LinearRegression()\n",
    "reg.fit(X_train, y_train)\n",
    "y_pred = reg.predict(X_test)\n",
    "r2 = r2_score(y_test, y_pred)\n",
    "mae = mean_absolute_error(y_test, y_pred)\n",
    "print('R^2:', round(r2,3))\n",
    "print('MAE:', round(mae,2))\n",
    "# 基线：用训练集均值预测\n",
    "baseline = np.full_like(y_test, fill_value=y_train.mean())\n",
    "mae_baseline = mean_absolute_error(y_test, baseline)\n",
    "print('基线 MAE (预测均值):', round(mae_baseline,2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35aef664",
   "metadata": {},
   "source": [
    "### 解释：\n",
    "- 如果模型 MAE 比基线显著降低 → 说明利用特征有效。\n",
    "- R² 负值说明模型比“直接用均值”还差。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a5e5f41",
   "metadata": {},
   "source": [
    "## 4. 分类：高污染日判别 (Logistic Regression)\n",
    "标签：high_flag = 1 (PM25>55)。特征可与回归相同。\n",
    "评估：Accuracy。说明：数据量少 → 结果仅演示。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "cfda94d8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 1.0\n",
      "分类报告:\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0      1.000     1.000     1.000         5\n",
      "           1      1.000     1.000     1.000         3\n",
      "\n",
      "    accuracy                          1.000         8\n",
      "   macro avg      1.000     1.000     1.000         8\n",
      "weighted avg      1.000     1.000     1.000         8\n",
      "\n"
     ]
    }
   ],
   "source": [
    "X_class = model_df[features]\n",
    "y_class = model_df['high_flag']\n",
    "Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_class, y_class, test_size=0.3, random_state=42, stratify=y_class)\n",
    "clf = LogisticRegression(max_iter=500)\n",
    "clf.fit(Xc_train, yc_train)\n",
    "yc_pred = clf.predict(Xc_test)\n",
    "acc = accuracy_score(yc_test, yc_pred)\n",
    "print('Accuracy:', round(acc,3))\n",
    "print('分类报告:')\n",
    "print(classification_report(yc_test, yc_pred, digits=3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "484cfd60",
   "metadata": {},
   "source": [
    "注意：若标签极不平衡，Accuracy 可能失真。以后需看 Precision / Recall / F1。课程限于入门，仅展示概念。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc3eb4c5",
   "metadata": {},
   "source": [
    "### 小练习 2\n",
    "1. 改变划分 random_state 看结果是否稳定。\n",
    "2. 若全部预测 0，Accuracy 约是多少？(计算标签占比)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d53996a7",
   "metadata": {},
   "source": [
    "## 5. 模型系数 (线性回归 & Logistic)\n",
    "系数方向：正 → 特征增大，预测值增大；负 → 反向。大小受量纲影响→ 正规做法需标准化 (本课略)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "46e96be5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "feature",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "linear_coef",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "logistic_coef",
         "rawType": "float64",
         "type": "float"
        }
       ],
       "ref": "3d2da3cd-c9b0-4483-94cf-9eab60386c29",
       "rows": [
        [
         "0",
         "PM10",
         "1.0069907098485456",
         "1.052641167850804"
        ],
        [
         "1",
         "NO2",
         "-0.48659618324066284",
         "-0.09585329403134264"
        ],
        [
         "2",
         "SO2",
         "0.3045475905656496",
         "-0.1942904937853275"
        ],
        [
         "3",
         "PM25_lag1",
         "-0.08332593127968851",
         "-0.18025561148993968"
        ]
       ],
       "shape": {
        "columns": 3,
        "rows": 4
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature</th>\n",
       "      <th>linear_coef</th>\n",
       "      <th>logistic_coef</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>PM10</td>\n",
       "      <td>1.006991</td>\n",
       "      <td>1.052641</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NO2</td>\n",
       "      <td>-0.486596</td>\n",
       "      <td>-0.095853</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>SO2</td>\n",
       "      <td>0.304548</td>\n",
       "      <td>-0.194290</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>PM25_lag1</td>\n",
       "      <td>-0.083326</td>\n",
       "      <td>-0.180256</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     feature  linear_coef  logistic_coef\n",
       "0       PM10     1.006991       1.052641\n",
       "1        NO2    -0.486596      -0.095853\n",
       "2        SO2     0.304548      -0.194290\n",
       "3  PM25_lag1    -0.083326      -0.180256"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coef_table = pd.DataFrame({\n",
    "    'feature': features,\n",
    "    'linear_coef': reg.coef_,\n",
    "    'logistic_coef': clf.coef_[0]\n",
    "})\n",
    "coef_table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8763ca9",
   "metadata": {},
   "source": [
    "### 解释提示：\n",
    "- 仅在所有其他特征固定时的线性近似影响。\n",
    "- 不代表因果关系。\n",
    "- 多特征相关时系数不稳定。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fdb6184d",
   "metadata": {},
   "source": [
    "## 6. 过拟合直观认识\n",
    "过拟合：模型在训练集表现很好，在测试集差。今天用的线性模型偏简单，不易严重过拟合。\n",
    "概念记住即可：需要平衡“拟合能力”与“泛化能力”。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1955094a",
   "metadata": {},
   "source": [
    "### 小练习 3\n",
    "1. 尝试只用 PM25_lag1 作为唯一特征再训练回归，比较 MAE。\n",
    "2. 思考：简单模型结果与多特征相比差多少？是否值得引入更多复杂特征？"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bd4847a",
   "metadata": {},
   "source": [
    "## 7. 快速可视化：真实 vs 预测 (回归)\n",
    "仅用于直观感受。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "f4811036",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeoAAAGGCAYAAAC0W8IbAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjUsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvWftoOwAAAAlwSFlzAAAPYQAAD2EBqD+naQAASpRJREFUeJzt3Xd4VGX+/vH3pJNKQkhIIAQQKQJijIIKIqCiKBhXQBGUIlj2txbE1YWFL7Aq4loW7A2QoiwKKyAiFoogKLqUgCAtlNAhhfRkksmc3x9nMxhSSCDJTJL7dV25ds6ZMzOfjLPceZ55isUwDAMRERFxSW7OLkBERETKpqAWERFxYQpqERERF6agFhERcWEKahERERemoBYREXFhCmoREREXpqAWERFxYQpqkSpy+vRpZ5fg8lJSUpxdgkito6AWAVatWsUDDzxwSc/x97//nRUrVlRRRdUrKSmJxMRE0tLSHD/r1q3j6aefJjs7m/T09BKPKSws5MknnyQhIcFxbtasWRw6dAgo+YfK6dOnSUtLK3bu6aef5p///Ocl179x40YKCgoASEhIIDc3t8KP3bRpExs2bCj3munTp7Nv374KPV9iYiLHjh2r8OuLVJaHswsQqUlZWVl4enri7e1d7HxGRgZt27Ytcf24ceMIDw/n6aefLnZ+7dq19OrVq9i5UaNG8f7773PnnXcWO79582by8vLo3r17Ff0Wly4lJYXHH3+cRo0aYbFYAFi5ciVRUVE88sgjDB06lODgYOx2O9u2beOaa65xPHbNmjW0bt0agDvvvJPrrruOX375hYkTJ7Jv3z7H8+3du5e+ffvy7rvvcvbsWSIiIvD29i7xvlXWW2+9xdVXX42npycAu3fv5uabb6Zly5YVevzx48eJjY0t97/H3LlzueWWWzh9+jSBgYE0aNCg2P2rVq0iKCiIa6+9lujoaD744APat29Pjx49Lv4XEymDglrqlW3btjF27Fj8/PyKnT958iSenp6sXr26xGPc3NwYPXo0AQEBAOTm5vLYY48RERFR7DrDMLDZbPTs2bPY+X379mG329m9ezfBwcFV+wtdpHbt2rFq1SrWr19Pt27dmDVrFt27d+cvf/kLFouFlStXEh4ezrp160hISGDUqFGkp6eTlZVF06ZNHc/TpEkT7r//fnJzc8nPz+f11193hPrLL79Mu3btSE5O5qqrrmLv3r24u7vj5eV10XVPnz6dzp07061bN8e5Bg0a8OCDD/Liiy9W6DnefvttGjZsWOzc4sWLWbVqFe+//77jOdu3b8/MmTOZM2cO/v7+xa4/fPgwgwYN4tprrwXg0Ucf5YUXXsDf35+rr776on8/kdIoqKVeufHGG/nvf/8LwPfff8+tt94KwEsvvcSNN97IjTfeCMChQ4dISEhw3P9H7733Hs888wyPPPJIifvi4+Np3rw5ISEhjnNFIeIqIf1HzZs35+abb2bIkCE8/vjjjvOvvPIKv/76K2fOnCEiIoL58+dz+PBhWrZsyZo1a7BYLGRkZBAYGMi0adP4+OOP8fX1BWDevHn4+Pg4nsvHx4fOnTvTqFGjS6p1586dHDx4sETvxvlOnz5NeHh4sXMJCQmOXoDSGIZRrD53d3cAHnvsMR577LES17/44otcddVVxc6NGzeOAQMGsGTJEsfjRaqCglrqreHDh9OmTRvH8Xfffee4nZeXx6lTp9i5c2ex1tSJEyeYMWMG8fHxANjtdiZMmEDv3r259dZb+eCDDwgKCuLll18GICcnh1GjRhEdHV0zv1QltWjRgpEjR7Jw4cJif3g0aNCAyZMnc8sttzjOLVy4kFOnTjm6tt999138/Px44oknSE1NdYTTrl27iIqKcjzOw8PD0U19KV577TWefPLJC1730EMPkZ2dXezcxo0b+eSTT7jvvvtKfYynp6ejx+RieXp60qVLF5YsWcLAgQMv6blE/khBLfVWREQEr732Wqn3nTx5kk8//bREl2dkZCQHDhzA09OTwsJC3N3def755+nSpQv/+te/8PHxKfYdrK+vb4mucFcwceJE1q5dS0BAAHl5eaxbt47bb7+do0ePMnbs2GLXdu3alXfeeafEc4wdO5bo6Gji4uIICgpynPf396dFixbs3LmzyuotLCxk+/btFepWtlgsLF68mNDQUMe56667jnvvvbdSr1n0B0llDBw4kMmTJyuopUpp1LfUWxfzDzHgaB2+++67LFiwAE9PTz755BOuuOIKvL29HV3AF2vnzp20atWKV155xXFu9+7dDBo0yDHSGeDVV19l8uTJvPHGG7zyyissWrSIim4vP3nyZNatW8c333xD69at+e233xg3bhydOnUqEWhnz54lNja2xHN4eXnxzTffsGzZMm677TbHeR8fnyppQf9RQkJCie7syqrMf2+73Y67uzuZmZklRq6Xp3Xr1mzbtu0iqhMpm1rUUm9lZGTw17/+tdT78vLyaNGiRbmPf+KJJ+jfvz95eXk89NBDgDma+vwRwpXVsWNHnnrqKfLz8x3njh07xquvvuoIwNWrV3Py5En+9a9/Oa4ZOXIk/fv3L/b9cFlKC9I1a9YwZ86cEo8PDw8vM+Q6d+7Mhx9+6Ojqtlgs5OTkOJ7DMAzsdvsF67mQ5OTkEoP3ynIxr3f+HzhF3fhnzpzh3nvvpWHDhsXeg8OHDzNjxowSz+PhoX9SperpUyX1VkhICD/88EOp9x0+fJhx48Zd8Dlef/11xzxiML/DbtKkSbFrVq9ezalTpxg6dGiFaxs6dCi33norzz77LGAObvvjwLaGDRuyZs0a1q9fT5cuXfDx8eGDDz64pBHVjz76aLGQLgqmP7biz5eamup4TGFhIW5ubmRlZREUFITdbsdqtVJQUHDJYR0cHExOTk6Frr2Y1zq/xqLAveyyy9iyZUuJ68sbYX4p/w1ESqOglnorNTW1zO+Py2pRHz58mFtuuYXIyEjc3M59czRt2jQAtm7dypAhQ4rdt3v3bqxWK926dbtgK71IaGgoLVq0YPPmzbRs2bLEiOnY2Fjeeust5s+fz1NPPYWHhwePPvooo0ePrtDzl+aP067sdjsWi6VYi7goeP9o6dKljulYBQUFeHh4cODAAZo2bUp+fj5ZWVmOqVuXonnz5pw8ebJC1+bn5xMXF1es1+D3338nJyenzK8liv6gALDZbBfdK5Kbm1ti6pfIpVJQS71V0Rb15MmT+fOf/0yTJk0cg6RK614+dOgQ/+///T9WrlxZJfUNHz6cuXPnEhsby+DBg4vdd+DAgWLTyYq6aNu2bes4dylsNhvu7u7s3LmTyMhIwAzirKwsxzWvvPIKL7/8MmvWrAEgICCAsLAwjh07RuPGjfH29ubyyy8nLy+vxFSmyvL398ff35+srKwSA/zO169fPx5//PFiLdvp06eXO2UqJyfH8bw7duygc+fOF1XnDz/8wB133HFRjxUpi4Ja6o29e/dy7733OuYzHzhwoMwWtdVq5ejRo3Tt2pXt27ezefNmx/KgZX0HPH78eGJiYpg3bx533XXXJbes7rzzTsaPH0+7du1KvOYbb7zBgw8+6FhwIywsjNatW1dZt2teXh4NGjRg+fLlDBgwgG+++YaTJ086QhvgueeeY9asWY5BXtOnT2fXrl00btyY//u//2PMmDGOnoDSvs+trIceeojFixczYsSIEvf9sQfj/FHrwAXnXg8ePNgxcr28aVx/VFoX+5IlS5gyZcoFHytSGQpqqTfatm3Lzz//XKz7c//+/bzwwgt8+OGH+Pj4cObMGRITE+ncuXOFQ89ms/Hkk0/St29fhg8fzq5du3jsscfw8/NjwoQJtGrV6qLq9fT05LbbbqNr164l7ps0aRLvv/8+q1atwsvLi5MnT9KlS5dSr70YTz31FH5+fqxbt47ly5eTnp7OG2+8waBBg4pd17hx42Ih+dFHH/HGG2/g4+ND//79+fjjj0tdmvViDBw4kMcee4y77rqr2IIyeXl5zJs3j/Xr11foeY4fP87kyZOLnSsK6YSEBA4ePEjXrl3ZsGEDf/nLX2jcuHGJ5zh06BAvvfRSsXObNm2ibdu2xf6YEakShkg99emnnxodOnQw3nrrLePgwYOGYRhGYWGh8c033xiDBw827rrrLmPFihXlPse6deuMkSNHGuvWrStx37Jly4wmTZoY7777brXUf6l+++03Y+vWrUa3bt2MXbt2FbsvPj7eGDlypHHixAnHuaysLOOaa64xPvroI8e5kJAQY+fOnYZhGMYnn3xifPbZZ477vvzyS6N9+/aO+6ZOnWp06tSp2HNWVnp6ujFu3DgjLy/Pce7zzz83Jk+eXOHneOutt4zZs2eXOJ+YmGjcfvvtxrFjx4q9Xmn++te/FnsfEhMTjX/84x8VrkGkMtSilnolPz+f5cuX891339G2bVt++eWXYut+u7m5cdttt3Hbbbfx66+/Mnz4cE6cOFFskFZ2djbLli0jMTGRmJgYZs+eXepr3XXXXXTs2JGePXty9913V3h6UU3x9fXlhRdeYP/+/Y6V044cOcLnn39OeHg4M2fOLNZa9vPzY/78+Xz22WeAOco7MDCQs2fPsnDhQvz9/YmLi3Nc379/f3bt2gWYo9jnzJnDtddee0nvQ2BgIBMmTGDWrFk88sgjeHh4kJeXV6EpaUUsFkuxbuusrCxmz57Nnj17mDt3LmFhYcVerzT9+/d3fI997Ngx1qxZw6RJky7ytxIpn8UwKrhCgkgdsGnTJpo1a0azZs0qdH16ejq//vqrY2pUYWEh//nPf7jllluKdb+W5+DBgzRr1swlp+1kZ2dTUFDg+D5906ZNdOnSpVhAl8cwDCwWC9nZ2SU2Ojlf0QC1i11opqzXLm80d2nOr3Xnzp00bNiwwp+J89nt9gq/XyIXQ0EtIiLiwvRnoIiIiAtTUIuIiLgwBbWIiIgLU1CLiIi4MAW1iIiIC6tT86jtdjsnTpwgICCgyqaAiIiIVDXDMMjMzCyxwU9p6lRQnzhxwrEvroiIiKs7evToBefw16mgDggIAMxfvKwVhURERJwtIyODqKgoR26Vp04FdVF3d2BgoIJaRERcXkW+ptVgMhERERemoBYREXFhCmoREREXpqAWERFxYQpqERERF6agFhERcWEKahERERemoBYREXFhCmoREREXpqAWERG5kIICMAynvLSCWkREpDzHjsHUqfD99055eQW1iIhIaQwD1qyBadPg5ElYu9ZsWdewOrUph4iISJWxWuG778BmgyuvhGHDwNOzxstQUIuIiJTGxwdGjTK7vnv2hArsdFUdFNQiIiJgtpyXLIHISOjWzTx3+eXmjxMpqEVERE6dgpkz4ehR8PIyu7oDApxdFaCgFhGR+swwYMMG+Owzc6CYnx+MGOEyIQ1ODuo5c+bg5+dHbm4uISEh9OvXj7S0NF577TUiIiJIS0tj/PjxuLlpcLqIiFw8u91g35lM0nMKCPL1pE1YAG65OTB/PmzbZl7Uvr0Z0g0bOrPUEpwW1J9++inXX389bdu2BWDIkCH069ePKVOmMHHiREJDQ9myZQvvvPMOTzzxhLPKFBGRWm5LYipzf0ok4UwW+bZCvDzcaRfsxbOb/k1EYS64u8Pdd8OttzptwFh5nNZU3bdvnyOkARYsWEBKSgq5ubmEhoYCEBsby/r1651VooiI1HJbElOZumI3O4+nE+jjQbNgXwJ9PNielMucwiac9A6Av/0N+vRxyZAGJ7Wod+/ejZeXF8uWLWPjxo1kZmYybtw4Dhw4QKdOnYpdGxISQnJysiO8RUREKsJuN5j7UyJpOQW0aORLQGYaFqtBZmAIvl7urG7dhaRwP16Lau7Sq385JagPHTrE1q1bueOOO4iLiyM5OZlRo0YxePBgQkJCil3bqFEjTp06VWpQW61WrFar4zgjI6PaaxcRkdph35lMEs5kERbgzWUHd9J107dkBIaw8o5h2N09CA1swJ6z+ew7k0m7JoHOLrdMTvkjIjs7m0GDBhETEwNAaGgozZs3x263k5qaWuzalJQUmjRpUurzTJs2jaCgIMdPVFRUtdcuIiK1Q3pOAeTmcOumFXT78Ss8CgqweXjilW828Hw83cm3FZrXuTCnBHWrVq04duxYsXM+Pj60a9eO3377rdj51NTUMru9x48fT3p6uuPn6NGj1VaziIjULqFJxxi25lOiD+zEcLOwPeZGvrttKHkN/ADIKzAHlgX51vyyoJXhlK7v2NhYPvroI2w2Gx4eZgkpKSnExMTg4+NDamoqISEhxMfH07179zKfx9vbG29v75oqW0REagO7Hb75hlZffsmpwmwSPf3Y3mcAyeHnel0NwyApy0rHyCDahLnOnOnSOG161nPPPceUKVOIiooiMzOTZ555Bjc3N6ZMmcL06dNp2rQpycnJjBs3zlkliohIbWQYsGMHFsOgaZ8evOXdiaRCNxpbbfh4upNXUEhSlpWgBp4MuyEaNzfXHO1dxGIYTtoJuxpkZGQQFBREeno6gYGuOzBARESqgWGcm2KVnAwJCdC1K1uOnC0xj/ryMH+G3RBNbHRI+c9ZTSqTV1pCVEREajerFRYtggYNYMAA81xoqPkDxEaHEBMVXHJlMhdvSRdRUIuISO119Ch89BGcPg1ubnDTTY6A/iM3N4tLT8Eqj4JaRERqH8OA1avhiy+gsNBcn3vUqFJDurZTUIuISO2SkQEffwy//24ex8TAgw+aO1/VQQpqERGpPex2eOUVSEoCT0+47z7o3t1l1+muCgpqERGpPdzcoF8/+P57GD0aIiKcXVG1U1CLiIhrO3ECcnPhssvM465d4dprze0p6wEFtYiIuCbDgHXrYPFi8/vn//s/8Pc3u7nrSUiDglpERFxRVhbMnQs7dpjHzZqZwV0PKahFRMS17N5tjupOTwcPD3MRk1696vSAsfIoqEVExDXY7bBkCXz3nXkcEWEOGGvWzLl1OZmCWkREXIPFAmfPmrd79IBBg8DLy7k1uQAFtYiIOI9hgM1mzom2WGDoUOjSBa680tmVuQwFtYiIOEdODnz6qRnUjz1mBnWDBgrp8yioRUSk5u3fD7NnQ2qquYjJsWMQFeXsqlySglpERGqO3Q4rVpg/hgFhYeZmGgrpMimoRUSkZiQnw6xZcPCgeXzDDeZa3T4+zq3LxSmoRUSk+hkGvPee2cXt4wMPPGAuAyoXpKAWEZHqZ7HAkCHmPOmRI6FRI2dXVGu4ObsAERGpow4fhs2bzx1fdhk884xCupLUohYRkaplt5uriy1bZm6e0bTpue0o6+kyoJdCQS0iIlUnLc2cdrV3r3l89dUQFOTUkmo7BbWIiFSN+HiYNw+ys8HbGwYPhuuvVyv6EimoRUTk0hgGLFwIP/xgHkdHm3Ojw8OdWlZdoaAWEZFLY7GAv795+7bb4K67zO0ppUronRQRkcozDLOLuyig77wTOnSAVq2cW1cdpOlZIiJSORkZ8NZbMGOGuaEGmOt1K6SrhVrUIiJScTt3wpw5kJlpbk15+DC0bu3squo0BbWIiFxYQYG5qtjq1eZx06YwejRERjq3rnpAQS0iIuU7eRJmzjTX6Qbo1QsGDDBb1FLtFNQiIlK+hQvNkPb3hxEjoFMnZ1dUryioRUSkfA8+CP/5j7mAiVYZq3Ea9S0iIsXt2QMrV547Dg2FRx9VSDuJWtQiImIqLIQvv4RvvzXnSbdsCe3aObuqek9BLSIicOYMzJplTrcCuPFGM6jF6RTUIiL1mWHApk3w73+D1Qq+vuZ30ldf7ezK5H8U1CIi9dmCBbB+vXm7TRt46CEIDnZuTVKMBpOJiNRnl19uLv8ZFwdPP62QdkFqUYuI1Cd2OyQnQ1iYedylC7Roce5YXI5a1CIi9UVqKrz+Orz6qrlWdxGFtEtTi1pEpD7YvBk++QRyc8HHB44f19SrWkJBLSJSl1mt8NlnsHGjedyyJYwaBY0bO7cuqTAFtYhIXZWYaG6mceYMWCzQty/06wfu7s6uTCpBQS0iUletXWuGdHCwOe2qTRtnVyQXQUEtIlJXDR5sfh/dvz/4+Tm7GrlIGvUtIlJX7NgBc+eaq42BGdKDByukazmntagnTZrE1q1bHce9e/dm7NixZZ4XEZEyFBTA4sXwww/mcbt20LWrU0uSqmMxjKI/vWrWhg0b6N69e4XPV0RGRgZBQUGkp6cTGBh4qSWKiLi+48fNAWMnTpjHt94Kd98NHvpm05VVJq/0X1JEpDYyDLMFvXgx2GwQGAgjR8IVVzi7MqliTgvqlJQU3nzzTY4fP47NZmP8+PGEhoaWeV5ERP5g4cJzXd2dOsHw4RAQ4NSSpHo4Lai7dOlCXFwcANu2bWPkyJEsX768zPOlsVqtWK1Wx3FGRkb1Fy4i4gq6doWff4Y//Ql69jTnSUud5LRR3xEREY7bMTExREZGcuTIkTLPl2batGkEBQU5fqKioqq9bhERp7DZ4MCBc8etWsG0adCrl0K6jnNKUKemppYI3/DwcJKSkko9n5KSUurzjB8/nvT0dMfP0aNHq61mERGnOX0aXn4Zpk8/N2gMNO2qnnBKUO/YsYM5c+Y4jm02G/Hx8SQnJ5d6/ooyBkd4e3sTGBhY7EdEpM4wDNiwAV58EY4eBS8v0Fd89Y7TpmctXbqUlJQUDMMgISGBgQMHcs0115R5viI0PUtE6ozsbHO3q6J1Jdq1M0d1N2zo1LKkalQmr5wW1NVBQS0idcK+fTB7Npw9C25u5oCxW2/Vd9F1iOZRi4jUZvv3myEdFgajR0N0tLMrEidSUIuIuALDONdi7tvXXFmsZ0/w9nZqWeJ82pRDRMTZfv0VXn/dXLMbzO7u225TSAugFrWIiPPk5cG//w2bNpnH69fDzTc7tyZxOQpqERFnOHTI3EwjOdns8u7Xz1y8ROQ8CmoRkZpkt8O338KXX5q3GzWCUaPgssucXZm4KAW1iEhN+s9/YNUq8/a118KQIeDr69yaxKUpqEVEalLv3rBli7lndNeumhstF6SgFhGpTlYr/PYbFK2w2KiRuSSoh/75lYrRJ0VEpLocPQoffWRuquHjAx07mucV0lIJ+rSIiFQ1w4DVq+GLL6Cw0Fyf28vL2VVJLaWgFhGpShkZ8PHH8Pvv5vFVV8GwYdqSUi6aglpEpKrs2mWGdGYmeHrCffdB9+4aMCaXREEtIlJVcnLMkG7WzNxMIyLC2RVJHaCgFhG5FDbbucFh115rfj8dE2O2qEWqgDblEBG5GIYBP/wAkyaZ30sX6dJFIS1VSkEtIlJZWVnw3nvmhhopKeZmGiLVRF3fIiKVsWcPzJ4N6elml/c995irjYlUEwW1iEhF2GzmRhrffWd2ezdpYg4Yi4pydmVSxymoRUQq4ptvzF2vAHr0gEGDtIiJ1AgFtYhIRdxyi7lm9+23m6O6RWqIBpOJiJQmJ+dcNzeYa3WPG6eQlhqnFrWIyPkSEmDWLEhNNada9eplntcKY+IECmoRkSJ2O6xYYf4YBoSGQosWzq5K6jkFtYgImPOhZ82CAwfM4+uug/vvN7u8RZxIQS0ismOHGdJ5eWYwDx1qrjAm4gIU1CIiAQGQnw+tWsGoUWaXt4iLUFCLSP2UnX1uj+iWLWHsWLjsMnDTZBhxLfpEikj9Yrebi5f8/e9w/Pi585dfrpAWl6RPpYjUH2lpMGMGLFlifh/966/OrkjkgtT1LSL1Q3w8zJtndnl7e8PgwXD99c6uSuSCFNQiUrfl58OiRee2ooyONgeMhYc7ty6RClJQi0jd9tNP50K6Tx+IizO3pxSpJfRpFZG6rUcPc0nQbt2gfXtnVyNSaRpMJiJ1S2YmLFwIBQXmsZubuW+0QlpqKbWoRaTu2LUL5syBjAwzoO+919kViVwyBbWI1H42mznlatUq8zgyErp3d25NIlVEQS0itdupUzBzJhw9ah736gUDBpjbU4rUAZUK6tOnTxP+hykN69evZ+7cuaSnp9OzZ08ef/zxKi9QRKRMO3bAhx+a30f7+8Pw4XDllc6uSqRKVWow2SOPPOK4vXTpUmbPns3UqVNZuHAhUVFRTJ06tcoLFBEpU1SU2XJu3x4mTVJIS51UqRa13W533P7mm2+YNWsW7u7uAMTFxZGYmFi11YmInC8pCRo3Nm8HB8O4cRAWBhaLc+sSqSaValFb/vB/hObNmztCukjDhg2rpCgRkRIKC2HpUvi//4Pffjt3PjxcIS11WqVa1Lm5uZw4cQLDMIiIiODs2bMEBwcDZms7KyurWooUkXouKckcMHb4sHm8bx906uTUkkRqSqWCeubMmRT8bxGB3r17ExAQAEB8fDzjxo1j7ty5VV+hiNRvmzbBggVgtYKvLzzwAMTGOrsqkRpjMQzDcHYRVSUjI4OgoCDS09MJDAx0djkicilyc+Hf/4ZffjGPL78cHnoIQkKcW5dIFahMXlXqO+q8vDyeffZZ4uLieO+994rd9/333/P5559XvloRkdLs22eGtJubuZHG2LEKaamXKtWinjRpEgMGDKBz586sXLmSgwcP8pe//AUAwzDo2LEju3btqvBzbd261XHcu3dvxo4dS1paGq+99hoRERGkpaUxfvx43Nwq9veEWtQidcyXX0LHjtCqlbMrEalSlcmrSn1HHR4eTufOnQHo27cvX331FZs2beK6667DYrEQW4nvjfr06cPzzz9f4vyUKVOYOHEioaGhbNmyhXfeeYcnnniiMmWKSG2Umgqffw5DhkDRP1x33eXcmkRcQKW6vv38/Iod9+vXj/j4eFJTUwHwuMQ9XlNSUsjNzSU0NBSA2NhY1hftIysiddeWLfDCC7Btm/m9tIg4VHrBk9TUVLKysmjevDkAjz76KG+88QaDBw+u1AunpKTw5ptvcvz4cWw2G+PHj2fHjh10Om/KRUhICMnJyY7wFpE6xGqFzz6DjRvN45Yt4Z57nFuTiIupVFAPGzaMyZMnk5qa6hhMZrFYGDNmDAsXLnRM3aqILl26EBcXB8C2bdsYOXIkQ4cOJeS8wSKNGjXi1KlTpQa11WrFarU6jjMyMirz64iIMyUmwqxZcPq0uWBJ377Qrx+ct5CSSH1XqaA+depUmet5Dx48mB49elT4uSIiIhy3Y2JiiIyMJD8/v0TYpqSk0KRJk1KfY9q0afzjH/+o8GuKiIvYtQveecdcbSw42Jx21aaNs6sScUmVCuo//elP9OvXr9hSokXsdjsrV67kl6I5j+U4v/sczIFqHTp04MMPPyxxbVnd3uPHj2fs2LGO44yMDKKioir664iIs7RuDY0aQdOm8OCDcN74FxE5p1JBPW7cOI4cOQJAWFgY7dq1c9xnGAanT5+u0PPs2LGD9evXM2nSJABsNhvx8fFMmDABHx8fUlNTCQkJIT4+nu7lbP7u7e2Nt7d3ZX4FEXGWAwfMaVYWC3h7w3PPmVtTap1ukXJd9MpkSUlJ7NmzB8MwyMvLIzo6mlatWuFZwc3aly5dSkpKCoZhkJCQwMCBA7nmmms4e/Ys06dPp2nTpiQnJzNu3LgSm3+URfOoRVxQQQH85z+wdi3cey/cfLOzKxJxumqbR10kJyeH+Ph41q5dS3x8PA0aNKB79+48/vjjFX6Ou+++u9TzwcHBpc6vFpFa6MQJ+Ogj838BNOBTpNIqFdQTJkxwBHO3bt0YNGgQU6dOdXxn/f3333PrrbdWS6EiUosYBqxbB4sWgc1mLmAyYgR06ODsykRqnUoFdXx8PM8995xjYZOcnBx+/vlnAAoLC3n77bcV1CL1XWYmzJsHO3aYxx07miH9v932RKRyKhXUn3zyCQkJCaSkpHDTTTfRoEGDYvefv1iJiNRDqamwcyd4eMCAAdCrlwaMiVyCSgX1p59+ypkzZ4iIiGDWrFnMnj3bsSc1QMOGDau6PhGpDQzjXBhHR5tTrpo3h2bNnFuXSB1QqaBOSkpyDPQaMGAAixcvZuTIkdVSmIjUEqdPw9y55mYaRcF8ww3OrUmkDqnUphytW7d23A4LC8PHx6fKCxKRWsIwzDW6X3zRnCOtzTREqkWlgjovL6/Y8fm7aa1bt+7SKxIR15eTY067mjcP8vOhXTt4+GFnVyVSJ1VqwZOgoCB69OiBxWLBMAwyMzMdE7ULCwtZu3YtOTk51VbshWjBE5EasH+/uZnG2bPg5gZ33w19+mjAmEglVNuCJwsWLODOO+8s8/6VK1dW5ulEpLbZvx9ef93s9g4Lg1GjoEULZ1clUqdd9BKirkgtapFqZrfDjBnmhhqDB5trdotIpVX7EqIiUo9s22YuWuLpaXZ1P/GEeVtEakSlBpOJSD2Slwcffwzvv28uBVpEIS1So9SiFpGSDh2CmTMhOdkcJBYYWHxRExGpMQpqETnHbodvv4UvvzRvN2pkDhi77DJnVyZSbymoRcSUlmZOu9q3zzy+5hoYOhR8fZ1alkh9p6AWEZNhwPHj5kju+++H665TV7eIC1BQi9RnhYXg7m7eDg6GRx6BkBBzjrSIuIQKBfXbb79NVFQUnhUY7RkSEkLXrl2x6C9xEdd29KjZ1X3PPXDllea5du2cW5OIlFChoA4JCSE9PR0PDw8utD7KnDlz8PX1Zc6cOVVRn4hUNcOA1athyRKw2WDZMujUSd3cIi6qQkE9ZMgQDMPgww8/JDY2Fje3c9OvCwsLycvLo0uXLnh7e5OTk8MDDzxQbQWLyCXIyIA5c2DXLvO4c2cYNkwhLeLCKvwddUpKCsnJyaxbt87RrW0YBgUFBeTm5uLj48O1117L4MGDadCgQbUVLCIXaedOM6QzM81FSwYNgh49FNIiLq5CQW2z2Vi8eDF9+/Yt0fVdUFBAdnY2V155JVarlfHjx9O3b99yN+8QkapjtxvsO5NJek4BQb6etAkLwM3tvPA9dgzeesu83bQpjB4NkZE1X6yIVFqFgjo7O5upU6fy9NNPl7jPbreTl5dHo0aN+PDDD4mJiVFIi9SQLYmpzP0pkYQzWeTbCvHycKd1mD/Db4gmNjrk3IXNmsGNN5ot6Xvu0TKgIrVIhXfP6tWrF2vXrmXEiBGEh4cTEBCAt7c3QUFBNGvWjLNnz9KuXTtiY2Oru+YyafcsqU+2JKYydcVu0nIKCAvwxsfTnbyCQpKyrAT5ePBSeAbtbu0GQUHmA7QEqIjLqJbds4q+l37mmWfw8vLC19cXq9VKRkYG+/btY/PmzWzZsoVWrVoRHBx8ab+BiJTLbjeY+1MiaTkFtGjk6/j/p5+3Bw2NfNp9/x9S0o5gP7oXtzFjzIBWSIvUShXu+k5LS2PIkCHFzhc1xg3D4Omnn6ZTp0688sor/OUvf6Fx48ZVX62IALDvTCYJZ7IIC/AutmZBk5OH6f7jcjyzMjhrt3CsaSuaO7FOEbl0FQpqPz8//vvf/3Lo0CF++eUX3N3d6dmzJ02aNClx7eTJk5k/fz7Dhg2r8mJFxJSeU0C+rRAfT28ALPZCrtq2no47N4EB6Q0bsbDzbTS+pjvN1ZIWqdUqFNS5ubn06dOHhx56iJEjR3L69GnWrl3L4cOHKSwspG3btrRo0QJfX18yMjJo27ZtddctUq8F+Xri5WF+Jx1amEuvNf+hUfJJAPa3uYp1V95EeqEbQb4aNCZS21UoqHfv3s3ixYsJDw/n9OnTDBgwgOeee45x48ZRUFDArFmzeOmll7j99ts5ePAgLVu2pGvXrtVdu0i91SYsgNZh/uw6kU5ggBeeBVbyvb35+fo7SIxuy8nUHDpG+tMmLMDZpYrIJarwqO8i+/fvJzo6Gi8vL8e5ou+wmzZtWuUFVoZGfUu9kZfHllPZTP16D+m5BbSxZWDx9SXVy88c9d3Akwl3ti8+RUtEXEZl8sqt3Hv/IC0tDYDLL7+8WEiD+R12UUhbrVb2799fyZJFpMIOHIDnnyd2/1Ym3NmeDpFBJPoEsy/fg4w8Gx0jgxTSInVIhbq+X331VVatWsWCBQtYtmwZV111leM+wzDIy8sjOzubbt26cfvtt3PPPfeUujiKiFwCux2+/hq++sqcE71xI7G9ehFz31UXXplMRGqtCnV95+TkEBcXxzvvvEOfPn148skniy0lWrQxx6lTp7jvvvu46aabqrXosqjrW+qslBRzS8oDB8zj666D++8HHx/n1iUiF6XKFzzx9TUXVGjTpg0tW7Zk6NChHDp0iMsuuwwvLy8CAwOxWCz8/PPPXH/99VXyS4jI/2zeDJ98Arm5ZjAPHQpduji7KhGpIRVemWz79u1MnTqVwsJCEhIS+PTTT/H39yc7O5vk5GTy8vIYOnSoglqkKqWkwOzZUFgIrVrBqFEQGursqkSkBlV41HefPn347rvv6N27N2vWrCn1mgULFnDmzBnGjBlTlTVWmLq+pU5atQpycqBfP3Cr8PhPEXFhVd71nZ+fT1ZWFocOHcJisZRYdcxut1NYWMiwYcPw8fFh+/btdO7c+eJ/A5H6yjDg+++hfXuIijLP3XKLc2sSEaeqUFAXFhby4osv0rJlS1avXk1ubi5z587lscceK/X6Dz74QEEtUllpafDxx7BnD4SHw//9n7ajFJGKBfWzzz7L999/z+7du3FzcyMxMZFnnnmG3NzcUqdhde3aFavVire3d5UXLFInbd8Oc+dCdjZ4ecFtt4FHhYeQiEgdVqEvvP72t7/h5ubGlClTMAwDT09PevToQd++ffnpp59KXH/VVVcppEUqoqAA/v1vePddM6SjomDiROjWTdtSighQwRZ1YGAgnTp1YsSIEXTt2pXrrruOX3/9leeffx7DMHj77bcBc/ETwzC4+uqree6556q1cJFaLyMD/vUvOGlupkGfPhAXp5a0iBRTqX8Rjhw5wgcffEBMTAz79+9nwYIFJa45dOgQQ4YM4emnn8ZT36+JlC0gAIKCzJb0Qw+ZA8hERM5ToaDOyckhIyODnj17AudWIitNYGAgb7/9tkJapDSZmeDtbX4PbbGY86ItFjO0RURKUaGg9vf358MPP3Qc5+Tk8Mgjj5R6baNGjWjUqFHVVCdSl+zaBXPmwFVXmauLAWi+v4hcQIWCOiAggIA//MUfEBDA/fffX21FidQpNhssWWIuXAKQkABWq9myFhG5AI1aEalOp07BzJlw9Kh53LMnDByo+dEiUmEuEdT33nsvn3/+OQCTJk1i69atjvt69+7N2LFjnVWayMX53zaUfPYZ5OeDnx+MGAFXXunsykSklnF6UK9atYrk5GTHcZ8+fXj++eedWJFIFcjJgS++MEO6fXszpBs2dHZVIlILOTWoDx8+TEhICL6+vs4sQ6Tq+fnB8OFw+jTceqsWLxGRi+a0oLbb7WzZsoUBAwYUO5+SksKbb77J8ePHsdlsjB8/nlBt6ycVYLcb7DuTSXpOAUG+nrQJC8DNrYYCsrAQli+HFi3MUd0AWu9eRKqA04L6yy+/pG/fviXOd+nShbi4OAC2bdvGyJEjWb58eanPYbVasVqtjuOMjIzqKVZc3pbEVOb+lEjCmSzybYV4ebjTOsyf4TdEExsdUr0vnpRkDhg7fNhsSbdpA+olEpEq4pTNbffs2UPTpk1L7fKOiIhw3I6JiSEyMpIjR46U+jzTpk0jKCjI8RNVtC2g1CtbElOZumI3O4+nE+jjQbNgXwJ9PNh1Ip2pK3azJTG1+l580yZ44QUzpH19zfnRCmkRqUJOaVH/+OOP7Nq1i08//RSA33//nTFjxnD11VfTs2dPmjdv7rg2PDyclJSUYueKjB8/vtiI8IyMDIV1PWO3G8z9KZG0nAJaNPLF8r/vgv28PfD1cicxNYd5PyUSExVctd3gubnmZhq//GIeX365uQxoSDW33kWk3nFKUD/88MPFjvft28eMGTP44YcfmDNnDpMmTQLAZrMRHx/PhAkTSn0eb29v7dJVz+07k0nCmSzCArwdIV3EYrHQ2N+b/Wey2Hcmk3ZNqmgVsNxcePFFSE4GNzfo3x9uv928LSJSxZw+PWvRokVs3LiRt956ixEjRpCWlsasWbMwDIOEhAQmTZqkMJYypecUkG8rxMez9M+Ij6c7yVlW0nMKqu5FGzSAK64wlwQdPRpataq65xYROY/FMAzD2UVUlYyMDIKCgkhPTydQayjXC3tOZTD2s+0E+njg513y785sq42MPBv/uq9zuS3qC44YP3vWnGJVNBc6P98c6d2gQRX/RiJSH1Qmr5zeoha5FG3CAmgd5s+uE+n4erkX6/42DIOkLCsdI4NoE1b27lQXHDG+dSvMnw/Nm8OYMWZge3nVwG8nIqKgllrOzc3C8BuimbpiN4mpOTT298bH0528gkKSsqwENfBk2A3RZQ4kKxoxnpZTQFiANz6e3uQVFLLrRDr/XLqdf9r30HJPvHlxXp65d7S/f839giJS7ymopdaLjQ5hwp3tHa3i5CwrXh7udIwMYlg586jLGzHeLDOJzl8t5oQ9mxatQ7H07WsOGnN3r8lfTUREQS11Q2x0CDFRwZVamazUEeOGwRW//8rVW36g0GbjhKcvhx58hFbdY2voNxERKU5BLXWGm5ulUlOwShsx7l5YyOX7tmOx2zke3ZZF7XvSPLIFGtctIs6ioJZ6K8jXEy8P8/vsohHjhR4erL8pjsZJx9kW3Qm7tZAgX+0dLSLOo6CWeqtNWABtQrwJ/fYrGoSH8nun6wE4GxJOanAYSak5FxwxLiJS3RTUUm+5nTrJc1u/YOfh3eQesvB707YUNgyu8IhxEZGaoKCW+scwYN06WLyYyIIC3K6I4uPWN3HS3Zf8szkVGjEuIlJTFNRSv2Rlwdy5sGOHedyhA01GjOBv/gH8yVl7WYuIlENBLfWHzQYvvQQpKeDhAffcA717g8WCG1Tdph0iIlVIQS31h4eHGcwbNpibaTRr5uyKREQuSEEtddvp01BQcC6Ub74ZbroJPDXlSkRqBwW1VKkL7kJVUwwDfvoJFi40d7yaOBG8vc0NNRTSIlKLKKilylxwF6qakpMDn34Kmzebx8HB5raU2tdcRGohBbVUifJ2oZq6YjcT7mxfM2G9fz/Mng2pqeDmBnFx0KePeVtEpBZSUMslK28XKl8vdxJTc5j3UyIxUcHV1w1ut8OKFeaPYUDjxuaAsRYtquf1RERqiIJaLlmpu1D9j8ViobG/N/vPZLHvTGb1TYGyWODAATOkr78eBg8GH5/qeS0RkRqkoJZLVtouVH/k4+lOcpaV9JyCqn9xu93s1rZYYORIs+v7mmuq/nVERJxEX9zJJfvjLlSlySswB5ZV6S5UeXkwZ445aMxRSJBCWkTqHLWo5ZK1CQugdZg/u06k4+vlXqz72zAMkrKsVbsL1eHDMHMmJCWZLelbboGIiKp5bhERF6MWtVwyNzcLw2+IJqiBJ4mpOWRbbRTaDbKtNhJTc6puFyq7Hb75Bv75TzOkQ0Lgr39VSItInaYWtVSJ2OgQJtzZ3jGPOjnLWrW7UKWlmdOu9u41j6+5BoYOBV/fS65dRMSVKailysRGhxATFVz1K5MZBkyfDqdOmYuWDB5sjuy2aHcrEan7FNRSpdzcLFU/BctigQED4KuvYNQoCA+v2ucXEXFhCmpxTUePQkYGdOhgHl95JXTqpFa0iNQ7CmpxLYYBa9bAF1+AlxdMmmSu1Q0KaRGplxTU4joyMsy50bt2mccdOminKxGp9xTU4hp27jRDOjPTDOdBg6BHD7WiRaTeU1CLcxkGLFoEq1ebx02bmptpREY6ty4RERehoBbnslig8H9Lj/bqZY7uVne3iIiDglpqnmGA1Xpud6uBA+Gqq6B9e6eWJSLiirSEqNSs7Gx4/3145x1zSVAwW9AKaRGRUqlFLTVn715zGdC0NHB3hyNHoEULZ1clIuLSFNRS/QoLYflyc0MNwzBXFnv4YYiKcnZlIiIuT0Et1evMGZg1y9yaEuDGG82pV97eTi1LRKS2UFBL9TEM+PhjM6R9feHBB+Hqq51dlYhIraKglupjsZhbUf7nPzBs2LmlQEVEpMI06luq1oED8OOP546bNYOnnlJIi4hcJLWopWrY7fD117BihXkcFaUR3SIiVUBBLZcuNdUcMJaQYB537QpNmji3JhGROkJBLZdm82b45BPIzTVXGhsyxAxqERGpEgpquXgLFsC6debtli3NzTRCQ51bk4hIHaOglosXHm6O7O7bF/r1M1cbExGRKqWgloozDEhPh4YNzePevaFNG60wJiJSjVxieta9997ruJ2WlsbEiRN55513mDp1KvaijRvEudLSYMYMeP11yMszz1ksCmkRkWrm9Bb1qlWrSE5OdhxPmTKFiRMnEhoaypYtW3jnnXd44oknnFihsH07zJ1r7nzl5WVuptGmjbOrEhGpF5zaoj58+DAhISH4+voCkJKSQm5uLqH/G5AUGxvL+vXrnVli/VZQAP/+N7z7rhnSUVEwYYJCWkSkBjmtRW2329myZQsDBgxwnNu+fTudOnUqdl1ISAjJycmO8JYacuwYzJwJJ0+ax7feCnffDR5O74QREalXnPav7pdffknfvn2LnTtz5gwhISHFzjVq1IhTp06VGtRWqxWr1eo4zsjIqJ5i6xG73WDfmUy8P/43DQ8k0jC8EZaRI6FDB2eXJiJSLzklqPfs2UPTpk0dXd5FwsLC+P3334udS0lJoUkZq1xNmzaNf/zjH9VWZ32zJTGVuT8lknAmCw+3K+hVmMqJK+5ksH8Esc4uTkSknrIYhmHU9It+9NFH7Nq1y3H85ZdfctdddxETE8NPP/3EBx984Lhv0KBBLFq0qNTnKa1FHRUVRXp6OoGBgdX3C9RBu1b9xFef/8C3bW4gLMAbH0938goKScqyEtTAkwl3tic2OuTCTyQiIheUkZFBUFBQhfLKKS3qhx9+uNjxvn37mDFjBgBbt24lNTWVkJAQ4uPj6d69e5nP4+3tjbe3d3WWWvfZbNi/WELaBwu5IjufnKbNORbaFgA/bw98vdxJTM1h3k+JxEQF4+ZmcXLBIiL1i9NHBi1atIiNGzfy1ltvMWLECKZMmcL06dNp2rQpycnJjBs3ztkl1l2nT8NHH5G+9wDpuQUc7nANJ5u2KnaJxWKhsb83+89kse9MJu2aqKdCRKQmOaXru7pUpiuhXjMM2LgRPvsM8vM5aXNjcsNY8jtciXspLeZCu8GxszlM/VMnurZq5ISCRUTqFpfv+pbqVTRyOz2ngCBfT9qEBRTvsl6wAIrmp7drR+YdAzj2TSKBBYX4eZf8SOQVFOLl4U6Qr2cN/QYiIlJEQV3H/HHkdr7NDNjWYf4MvyH63GCwzp3NFnVcHPTpQ2sDWm9NYdeJdHy93LFYzoW6YRgkZVnpGBlEm7AAJ/1WIiL1l0us9S1VY0tiKlNX7Gbn8XQCfTxoFuxLoI8Hu4+l8v4n69iSmGpe2LEjTJ0Kt90GFgtubhaG3xBNUANPElNzyLbaKLQbZFttJKbmENTAk2E3RGsgmYiIE6hFXUfY7QZzf0okLaeAFo18Ha3i8PwsBvy6DLfTp1jcuCExo3uagRscXOzxsdEhTLizvaM1npxlxcvDnY6RQQz7Y2tcRERqlIK6jth3JpOEM1mEBXg7QrrlgZ1ct+lbPAryyfH0JPXQsXJHbsdGhxATFVz+99siIlKjFNR1RHpOAfm2Qnw8vfEosHLdz9/S8qC5qExSWFN+6H4XBwo8Sc8pKPd53NwsmoIlIuJCFNR1RJCvJ14e7gSeOEKfX1bgn5kOFtjeuTu/XdmNrAI7XoZNI7dFRGoZBXUd0SYsgNZh/jTe/D3+mWlk+zfkxx53kRTWTCO3RURqMQV1HVE0cvvl5JspcPfkcGx33Pz8yLPaHOt1a+S2iEjto+lZtd3WrfDuu2C3Exsdwri7OpF2Wz+S7R4cO5tDRp6NjpFB2lRDRKSWUou6trJaYdEi+PFH83jjRrjxRo3cFhGpYxTUtdHRo/DRR+amGhaLuXDJ9dc77tbIbRGRukNBXZsYBqxeDV98AYWF0LAhjBwJ7do5uzIREakmCura5LPPYO1a8/ZVV8GwYeDn59SSRESkeimoa5MePeDXX+Huu+HGG81ubxERqdMU1K6soAASEqB9e/M4MhJeegl8fJxbl4iI1BhNz3JVJ07AtGnw5ptw6NC58wppEZF6RS1qV2MYsH69OfWqoAACAsypWCIiUi8pqF1JVhbMmwfbt5vHHTrAiBEQqKlWIiL1lYLaVezZAx9/DGlp4OEB99wDvXtrwJiISD2noHYVJ0+aId2kCYweDVFRzq5IRERcgIK6GtjtRsWW8DSMcy3mnj3N29dfD97eNVqviIi4LgV1FduSmMrcnxJJOJNFvq0QLw93Wof5M/yG6HObYhgG/PyzucrYs8+aI7ktFjOsRURE/kDTs6rQlsRUpq7Yzc7j6QT6eNAs2JdAHw92nUhn6ordbElMhZwcmDkT5s6FY8fghx+cXbaIiLgwtairiN1uMPenRNJyCmjRyBfL/7q0/bw98PVyJzE1h6+/+JGrT/+M5exZcHODuDjo08fJlYuIiCtTUFeRfWcySTiTRViAtyOki7gZBrcd/C8d4zeQ1iqE4Oim5oCxFi2cU6yIiNQaCuoqkp5TQL6tEB/PkgPBrtq2jg67fibbbie109UEP/mwVhgTEZEK0XfUVSTI1xMvD3fyCgpL3Lf7ii6k+gWz6vp+FAwbrpAWEZEKU1BXkTZhAbQO8ycpy4p7fh6t92933Jfr48t7PR/AuLYLbcICnFiliIjUNur6riJubhaG3xDNh/PWcOPKZTTJzyTf4s7uqHYkZVkJ8vNm2A3Rpc+nFhERKYOCuqrY7cTu/pUX965gn5HDcS8/Dto8yciz0TEyiGF/nEctIiJSQQrqqpCWBrNnw969NPbzotGgPuzvE0c0nuWvTCYiInIBCupLtXOnGdLZ2ebSn4MH43b99bTVZhoiIlIFFNSXymIxQzo6GkaNgvBwZ1ckIiJ1iIL6YuTng5eXebtDB/jLX+CKK8ztKUVERKqQpmdVhmGYG2n8/e+QknLu/JVXKqRFRKRaKKgrKjMT3n4bPv/cvP3jj86uSERE6gE1Ayti1y6YMwcyMsyW86BBcNNNzq5KRETqAQV1eWw2WLIEVq0yjyMjzc00mjZ1bl0iIlJvKKjLs3r1uZDu2RMGDgRPT6eWJCIi9YuCujw33wy7d0Pv3uaAMRERkRqmoC6PhweMGePsKkREpB7TqG8REREXpqAWERFxYQpqERERF6agFhERcWFOG0y2d+9e1qxZg8Vi4cCBAzz00EO0b9+eSZMmsXXrVsd1vXv3ZuzYsc4qU0RExKmcFtQvvfQSc+fOBcBqtfLAAw+waNEi+vTpw/PPP++sskRERFyK07q+W7Zs6bjt7e1NVlaWs0oRERFxWU5rUU+ZMsVxe8OGDXTr1g2AlJQU3nzzTY4fP47NZmP8+PGEhoY6qUoRERHncuqCJ1999RUbNmzgwIEDvPHGGwB06dKFuLg4ALZt28bIkSNZvnx5qY+3Wq1YrVbHcUZGRvUXLSIiUoOcOuq7X79+vPzyy8yfP59HH32U/Px8IiIiHPfHxMQQGRnJkSNHSn38tGnTCAoKcvxERUXVVOkiIiI1wilBfejQIZYsWeI49vHxoV27dmzevLlEKIeHh5OSklLq84wfP5709HTHz9GjR6u1bhERkZrmlK5vm83Gjh07+NOf/uQ4d+LECdLT01m1ahWTJk1yXBcfH8+ECRNKfR5vb2+8vb0dx4ZhAOoCFxER11aUU0W5VR6LUZGrqsHKlSs5cuQIHh4enD59mu7du9OjRw+WLl1KSkoKhmGQkJDAwIEDueaaayr0nMeOHVP3t4iI1BpHjx6lWbNm5V7jtKCuDna7nRMnThAQEIDFYiEjI4OoqCiOHj1KYGCgs8urNfS+XRy9bxdH71vl6T27OK70vhmGQWZmJpGRkbi5lf8tdJ3a5tLNza3Uv0wCAwOd/h+lNtL7dnH0vl0cvW+Vp/fs4rjK+xYUFFSh67TWt4iIiAtTUIuIiLiwOh3U3t7eTJ48udjIcLkwvW8XR+/bxdH7Vnl6zy5ObX3f6tRgMhERkbqmTreoRUREajsFtYiIiAtTUIuIiLiwOjWPeu/evaxZswaLxcKBAwd46KGHaN++PZMmTWLr1q2O63r37s3YsWOdWKnruvfee/n8888BSEtL47XXXiMiIoK0tDTGjx9/wYn59dEf3zN91spX1vujz1r5ynrf9Hm7sDlz5uDn50dubi4hISH069ev9n3ejDpk2LBhjtt5eXnGwIEDDcMwjB9//NFZJdUq33//vdGrVy/H8VNPPWUkJSUZhmEYmzdvNt58801nleayzn/P9FkrX1nvjz5r5SvrfdPnrXyffPKJsWfPHsfx/fffbxhG7fu8ufCfEJXXsmVLx21vb2+ysrKcWE3tcvjwYUJCQvD19QUgJSWF3NxcQkNDAYiNjWX9+vXOLNHlnP+eycXRZ02qy759+2jbtq3jeMGCBbXy81anur6nTJniuL1hwwa6desGmP8QvPnmmxw/fhybzcb48eMd/5HEXCN9y5YtDBgwwHFu+/btdOrUqdh1ISEhJCcn672j9PcM9Fm7kNLenx07duizdgFlfa70eSvb7t278fLyYtmyZWzcuJHMzEzGjRvHgQMHat3nrU4FNcBXX33Fhg0bOHDgAG+88QYAXbp0IS4uDoBt27YxcuRIli9f7swyXcqXX35J3759i507c+YMISEhxc41atSIU6dOueyHuSaV9p6BPmsXUtr7M3ToUH3WLqCsz5U+b2U7dOgQW7du5Y477iAuLo7k5GRGjRrF4MGDa93nrU51fQP069ePl19+mfnz5/Poo4+Sn59PRESE4/6YmBgiIyM5cuSIE6t0HXv27KFp06Ylum/DwsJITU0tdi4lJYUmTZrUZHkuqaz3DNBn7QJKe3/y8/P1WbuAsj5X+ryVLTs7m0GDBhETEwNAaGgozZs3x26317rPW50J6kOHDrFkyRLHsY+PD+3atWPz5s0lPrjh4eGkpKTUdIku6ccff+TTTz9lzJgxjBkzht9//50xY8Zw9OhRfvvtt2LXpqamuuxfnDWprPds3rx5+qyVIzU1tdT3p0OHDvqslaOs9y0pKUmft3K0atWKY8eOFTtXlAu17fNWZ4LaZrOxY8eOYudOnDhBeno6c+bMKXZdfHw8V1xxRQ1X6JoefvhhZsyY4fhp164dM2bMYPjw4fj4+Dj+8oyPj6d79+5OrtY1lPWeNW/eXJ+1cuzYsaPU96djx476rJWjrPctOTlZn7dyxMbGkpCQgM1mc5xLSUkhJiam1n3e6tRa3ytXruTIkSN4eHhw+vRpunfvTo8ePVi6dCkpKSkYhkFCQgIDBw7kmmuucXa5LmfRokWMHj2aF198kREjRmCz2Zg+fTpNmzYlOTmZcePG4e7u7uwyXcr579nq1av1WStHWf9fPHv2rD5r5SjrfdO/beU7ePAgs2fPJioqiszMTPr27UuHDh1q3eetTgW1iIhIXVNnur5FRETqIgW1iIiIC1NQi4iIuDAFtYiIiAtTUIuIiLgwBbWIiIgLU1CL1HK5ubnOLkFEqpGCWqSWe/HFF1m2bFmxc8ePH7/g4yZOnEhycnJ1lSUiVaTO7Z4lUpctW7aMgwcP4unp6Tj3888/4+/vz9GjRwEwDIO3336bMWPG8Oc//xkwl5fcs2cPHTt2dDyuQ4cOfPvttwwdOtRxbuPGjVx22WUuvUGBSH2jlclEahHDMLBYLKSlpdGwYUMARo0axaxZsxz3p6WlERwcXOxxW7duZeTIkSXOn2/v3r307duX2bNnl3vdV199xZAhQ3j11Vdp0KABCQkJhIaG8sADD/DAAw/QuXNnpk2bBsDixYt5+eWX+eKLL4iIiGDevHlYLBZycnI4deoUf//73/H19eXw4cOMHj0aHx8fACwWCx9++GGxHaJE6iMFtUgtNGbMGOLj4wEzXNu2bQtARkYGAP/973+LrV3cr18/3nzzTVq1asX+/ftp3bo1FouF119/nfvuu49mzZoBMH78eF544QU8PC7c2XbnnXeyYsUKx/HEiRPp378/ubm5/POf/2TlypUA/Otf/yI9PZ1//OMf/P3vf2fs2LGOnYri4+P55JNPeO211zh8+DAeHh6OWkTEpO+oRWohHx8f3nzzTZYuXcru3btZunQpS5cuZeXKlVx55ZXFQjojI4NbbrmFVq1aAeDh4cG9995LUlISK1eudHRzFxQUMHbs2AqFNJgt3j+KjIwkJSUFDw8PGjRoQFZWVolrU1JSip2/6qqr2Lt378W9CSL1hL6jFqmlNm3axMGDB4udKygoKHFdYGAgY8aMIScnh2XLlnH//fczatQokpKSCAsLcwSzp6cnjRs3vqhaTp8+zerVq5k3bx7btm2jW7dubNy4kZiYGMLCwhwt/Q8++KDY477++mtuv/12x/F3333HyZMnSUxMpG/fvvzpT3+6qHpE6hIFtUgtNXv2bMf3uUXsdruj5Xw+X19f/Pz8+POf/8x7770HUGyv3srKzMxk6dKl5OTkkJuby+zZs/Hz8wOgZ8+eLF68GB8fH9q1a0dCQkKJxy9fvpzDhw/zxBNPAGaLPC4ujkaNGmGz2Rg1ahTBwcH07NnzomsUqQsU1CK11MKFC2nRokWxc3l5eTz22GPFzlmtVqxWK4GBgdx11138cVhKfn5+sWvffvttrrzySnr06HHB1w8ICODuu+8u9b6YmBgmTpzIbbfdVuI+u93OBx98QOfOnenfv7/jvJeXF40aNQLM7vmXX36ZCRMmKKil3lNQi9RSS5cudQzKKlLU9W21Wlm9ejV33HEHubm53HLLLfj7+zuumz59Onl5eRw7dqxEEO7YsaNCQV0eNzc3GjRoQF5eXrHXNQyDGTNmMGzYsBK1b9u2jZiYGMdxaGgoaWlpl1SHSF2goBapJX7++WeeffZZPDw8OHjwIJGRkaV2fe/bt4+rr76aPXv2EB8fT6dOndi8eXOJ55s5cybu7u6MHDnyouqx2+3lnu/WrRvR0dFkZWU5WvELFy5k8ODBjpC22+2sWrWKPn368Le//Y2vv/7a8Z35ihUr1JoWQdOzRGoNu92O1WqlQYMGgLl06OrVq+nXrx8Aa9eupXHjxsUWNSnLzp07ee+99+jfvz8//vgjHTt2ZODAgcUWUinP119/zT333MPKlSvp1auX4/zZs2cZNmwYjz/+OD179sRisfDkk0+yZcsWlixZwrhx4xxBXFhYyObNm5k8eTKDBw/m5MmTfPHFF/j5+ZGenk5GRgYTJkzAzU2TU6R+U1CL1EJ79uxh7NixdO3alb/+9a/4+flhtVpZtGgRc+bMoW/fvjzzzDMlHldYWMj8+fM5duwYf/vb3xzBPH/+fN555x3eeustrr322pr+dUSkHApqkVrk119/5ddff6WgoIDRo0cTEBBQ4hqr1cpTTz3F448/7mhdnz59mlWrVpGdnU3fvn2Jiooq8biEhAQGDx5caje5iDiPglqkFklJSXGMjK6Mn3/+mS5duhRbCKU0SUlJFz2XWkSqh4JaRETEhWmUhoiIiAtTUIuIiLgwBbWIiIgLU1CLiIi4MAW1iIiIC1NQi4iIuDAFtYiIiAtTUIuIiLiw/w/0PCw4Wb3ufAAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 500x400 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "# 修复中文显示与负号问题\n",
    "plt.rcParams['font.sans-serif'] = ['AR PL UMing CN']  # 若无 SimHei，可改为 ['Arial Unicode MS'] 或系统中文字体\n",
    "plt.rcParams['axes.unicode_minus'] = False\n",
    "plt.figure(figsize=(5,4))\n",
    "plt.scatter(y_test, y_pred, alpha=0.7)\n",
    "plt.xlabel('真实 PM25')\n",
    "plt.ylabel('预测 PM25')\n",
    "plt.title('真实 vs 预测 (回归)')\n",
    "# 参考 y=x 线\n",
    "lims = [min(y_test.min(), y_pred.min())-1, max(y_test.max(), y_pred.max())+1]\n",
    "plt.plot(lims, lims, 'r--', alpha=0.6)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb5b1fb4",
   "metadata": {},
   "source": [
    "### 小练习 4\n",
    "1. 计算预测残差 (y_test - y_pred) 的平均值与标准差。\n",
    "2. 观察是否存在系统性偏差 (平均值显著非 0)。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b00159e",
   "metadata": {},
   "source": [
    "## 8. 小结\n",
    "流程核心：特征准备 → 划分训练/测试 → 训练 (fit) → 评估 (R²/MAE/Accuracy) → 解释/复盘。 \n",
    "\n",
    "今日要点：\n",
    "1. 构造时间相关特征：lag (按分组 shift) 是时序常用基线做法。\n",
    "2. 回归评估双视角：误差 (MAE) + 拟合度 (R²)；与“预测均值”基线比较判断是否有价值。\n",
    "3. 分类起步看 Accuracy，同时意识到类别不平衡隐患。\n",
    "4. 系数只表相关方向与线性近似，不等于因果；特征相关性会影响稳定性。\n",
    "5. 过拟合概念：训练好/测试差；简单线性模型示例中尚不突出，但需保有警觉。 \n",
    "\n",
    "实践提示：\n",
    "- 先做“无模型”基线（均值 / 频率），再上模型验证增益。\n",
    "- 固定 random_state 保障可复现；保留特征列表以便快速做 ablation（删减对比）。\n",
    "\n",
    "展望：Day 8 综合项目中，若加入模型务必注明：数据范围、特征来源、阈值假设与局限。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "5b5e4f85",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAE: 0.58  R2: 0.982\n",
      "中位数基线 MAE: 13.12\n",
      "残差均值≈0? -0.506   标准差: 0.81\n",
      "CV MAE 平均: 0.95\n"
     ]
    }
   ],
   "source": [
    "# 模型进阶代码示例 (基线 + 残差 + 交叉验证)\n",
    "import pandas as pd, numpy as np\n",
    "from sklearn.model_selection import train_test_split, cross_val_score\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.metrics import mean_absolute_error, r2_score\n",
    "df = pd.read_csv('../data/air_quality_timeseries.csv', parse_dates=['date']).sort_values(['city','date'])\n",
    "df['PM25_lag1'] = df.groupby('city')['PM25'].shift(1)\n",
    "model_df = df.dropna(subset=['PM25_lag1']).copy()\n",
    "X = model_df[['PM10','NO2','SO2','PM25_lag1']]\n",
    "y = model_df['PM25']\n",
    "X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=42)\n",
    "reg = LinearRegression().fit(X_train,y_train)\n",
    "pred = reg.predict(X_test)\n",
    "mae = mean_absolute_error(y_test,pred); r2 = r2_score(y_test,pred)\n",
    "print('MAE:', round(mae,2),' R2:', round(r2,3))\n",
    "\n",
    "# 1. 中位数基线\n",
    "baseline_med = np.full_like(y_test, fill_value=np.median(y_train))\n",
    "print('中位数基线 MAE:', round(mean_absolute_error(y_test, baseline_med),2))\n",
    "\n",
    "# 2. 残差分析\n",
    "residuals = y_test - pred\n",
    "print('残差均值≈0?', round(residuals.mean(),3),'  标准差:', round(residuals.std(),2))\n",
    "\n",
    "# 3. 简单交叉验证 (仅演示, 用全数据)\n",
    "scores = cross_val_score(LinearRegression(), X, y, scoring='neg_mean_absolute_error', cv=3)\n",
    "print('CV MAE 平均:', round(-scores.mean(),2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "116ca6a3",
   "metadata": {},
   "source": [
    "### 模型进阶与良好实践（入门视角）\n",
    "1. 随机种子：保证实验可复现 (random_state)。\n",
    "2. 基线多样化：回归可用中位数预测；分类可用多数类预测。\n",
    "3. 特征重要性：线性模型系数需标准化后再对比；树模型 (后续) 可看 feature_importances_。\n",
    "4. 数据泄漏警惕：不能用未来信息（如同日平均后再预测）。\n",
    "5. 评估切分：后续可用交叉验证 cross_val_score。\n",
    "6. 管道思想：Pipeline 封装预处理 + 模型，防止数据泄漏。\n",
    "7. 残差分析：残差 vs 预测值分布是否随机；系统模式提示欠拟合/非线性。\n",
    "8. 过拟合症状：训练指标显著好于测试；加入正则化可改善。\n",
    "9. 指标多角度：回归( MAE / RMSE / R² )；分类(Accuracy + Precision/Recall/F1)。\n",
    "10. 记录实验：表格化 (特征集/参数/指标) 便于比较。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d019e09",
   "metadata": {},
   "source": [
    "## 9. 课后作业提示\n",
    "提交 *homework_day7.ipynb*：\n",
    "1. 自行再增加 1 个特征 (如 PM10 - PM25)，重新训练回归并记录 MAE 与 R² 变化。\n",
    "2. 再做一次分类：修改阈值为 60，对比 Accuracy 有何变化。\n",
    "3. 将回归训练封装成函数 train_simple_reg(df, feature_list) 返回 (r2, mae)。调用两次比较。\n",
    "4. 写 120 字：说明你今天模型的局限（数据量、特征、评估方式、潜在偏差）。\n",
    "(可选) 5. 把线性回归预测结果按误差绝对值排序，列出误差最大的 3 条记录并猜测原因。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c139dfe4",
   "metadata": {},
   "source": [
    "---\n",
    "📌 提示：明天 Day 8 综合实践可选择是否加入“简单模型”一节，务必明确说明假设与局限。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "prophet",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
